Steps taken before implementing PCA:
- Replaced values below the limit of detection (< LOD) with the minimum observed value for each variable divided by 2. Note: after this calculation, some values became infinite; these were set to 0.
- Checked missingness in the data.
- Removed zero variance variables.
- Centered and scaled all variables so that the linear combinations found by PCA are chosen by how much variation they explain rather than by the variables' measurement scales. This also improves numerical stability.
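The preprocessing steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for this report: below-LOD entries are assumed to be marked with the string "<LOD", missing values with None, and the function names (count_missing, impute_below_lod, preprocess) are hypothetical. Scaling uses the sample standard deviation.

```python
import math
import statistics

LOD_MARK = "<LOD"  # assumed marker for below-LOD entries (hypothetical)


def count_missing(columns):
    """Number of missing (None) entries per variable."""
    return {name: sum(v is None for v in vals) for name, vals in columns.items()}


def impute_below_lod(values):
    """Replace below-LOD entries with min(observed) / 2; set non-finite values to 0."""
    observed = [v for v in values if v != LOD_MARK]
    half_min = min(observed) / 2 if observed else 0.0
    filled = [half_min if v == LOD_MARK else v for v in values]
    return [v if math.isfinite(v) else 0.0 for v in filled]


def preprocess(columns):
    """Impute, drop zero-variance variables, then center and scale the rest.

    Assumes no missing (None) values remain; check with count_missing first.
    """
    result = {}
    for name, values in columns.items():
        filled = impute_below_lod(values)
        sd = statistics.stdev(filled)  # sample standard deviation (n - 1)
        if sd == 0:
            continue  # zero-variance variable: carries no information for PCA
        mean = statistics.fmean(filled)
        result[name] = [(v - mean) / sd for v in filled]
    return result


example = {
    "protein_a": [1.0, LOD_MARK, 4.0, 3.0],
    "protein_b": [2.0, 2.0, 2.0, 2.0],  # constant, will be dropped
}
scaled = preprocess(example)
print(sorted(scaled))
```

After this step every retained variable has mean 0 and standard deviation 1, so no single variable dominates the components merely because of its units.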
Notes about the following results:
- The number of components for each dataset was chosen based on its scree plot.
- Importance was calculated from the contribution of each variable to each component. For example, if all variables contributed equally to a component, each would account for 1/ncol(data) of it. Any variable that contributes more than 1/ncol(data) to a component can therefore be considered an important contributor to that component. The vertical line in each plot marks this threshold.
- Output data from PCA merged with bmid has been saved to “/data/KI/imic/results/pcaData/”.
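The importance threshold described above can be illustrated as follows. The report does not state exactly how contributions were computed, so this sketch assumes a common convention: a variable's contribution to a component is its squared loading divided by the sum of squared loadings on that component. The function names and the loadings in the example are made up for illustration.

```python
def contributions(loadings):
    """Share of one component attributable to each variable (shares sum to 1).

    Assumes contribution = squared loading / sum of squared loadings.
    """
    total = sum(w * w for w in loadings.values())
    if total == 0:
        raise ValueError("all loadings are zero")
    return {name: (w * w) / total for name, w in loadings.items()}


def important_contributors(loadings):
    """Variables whose contribution exceeds the equal-share threshold 1 / n_variables."""
    contrib = contributions(loadings)
    threshold = 1 / len(contrib)  # the 1/ncol(data) rule from the notes above
    ranked = sorted(contrib.items(), key=lambda item: item[1], reverse=True)
    return [(name, share) for name, share in ranked if share > threshold], threshold


# Made-up loadings of four variables on a single component.
pc1 = {"p1": 0.8, "p2": -0.5, "p3": 0.3, "p4": 0.1}
top, cutoff = important_contributors(pc1)
print(cutoff, [name for name, _ in top])
```

With four variables the threshold is 0.25, so only variables carrying more than a quarter of the component are flagged. The sign of a loading does not affect its contribution because loadings are squared.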
VITAL
Proteomics data
Scree plot

Top 10 contributors: importance

Top 10 contributors: value
